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Abstract: In this paper we introduce a novel method for linear system identification with 
quantized output data. We model the impulse response as a zero-mean Gaussian process 
whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes 
information on regularity and exponential stability. This serves as a starting point to cast our 
system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo 
(MCMC) methods to provide an estimate of the system. In particular, we show how to design a 
Gibbs sampler which quickly converges to the target distribution. Numerical simulations show 
a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based 
methods when employed in identification of systems with quantized data. 


1. INTRODUCTION 

Identification of systems from quantized data finds appli¬ 
cations in a wide range of areas such as communications, 
networked control systems, bioinformatics (see e.g. [Bae 
and Mallick, 2004] and [Wang et ah, 2010]). 

From a system identification perspective, identification 
of systems having quantized output data constitutes a 
challenging problem. In fact, the presence of a quantizer 
cascaded to a dynamic system, causes a loss of information 
on the behavior of that dynamic system. Thus, standard 
system identification techniques, such as least-squares or 
prediction error method (PEM) [Ljung, 1999], [Soderstrom 
and Stoica, 1989], may give poor performances. For this 
reason, in recent years several techniques for system identi¬ 
fication from quantized data have been proposed in a series 
of papers. Some of these methods are specifically tailored 
for identification of systems with binary measurements 
[Wang et ah, 2003], [Wang et ah, 2006], and are possibly 
implemented in a recursive fashion [Guo and Zhao, 2013], 
[Jafari et ah, 2012]. Other methods, such as [Colinet and 
Juillard, 2010], exploit the knowledge of a dithering signal 
to improve the identification performances. Specific input 
design techniques are studied in [Godoy et ah, 2014], 
[Casini et ah, 2011] and [Casini et ah, 2012]. Methods 
for handling general types of quantization of data have 
been proposed recently [Godoy et ah, 2011], [Ghen et ah, 
2012b]. In such contributions, the problem of identifying 
a linear dynamic system with quantized data is posed as 
a likelihood problem. In particular, in [Ghen et ah, 2012b] 
authors exploit the recently proposed Bayesian kernel- 
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based formulation of the linear dynamic system identifi¬ 
cation problem (see [Pillonetto et ah, 2014] for a survey). 

Similarly to [Ghen et ah, 2012b], the starting point of 
this paper is the formulation of the problem of identifying 
a linear dynamic systems with quantized data using a 
Bayesian approach. We model the impulse response of the 
unknown system as a realization of a Gaussian random 
process. Such a process has zero mean and its covariance 
matrix (in this context also called a kernel) is given by the 
recently introduced stable spline kernels, [Pillonetto and 
De Nicolao, 2010], [Pillonetto et ah, 2011], [Bottegal and 
Pillonetto, 2013] which are specifically designed for linear 
system identification purposes. The structure of this type 
of kernels depends on two hyperparameters, namely a scal¬ 
ing parameter and a shaping parameter, whose tuning per¬ 
mits more flexibility in the identification process and can 
be seen as a model selection step. In the standard setting 
(i.e. when there is no quantizer), kernel hyperparameters 
are chosen as those maximizing the marginal likelihood of 
the output measurements, obtained by integrating out the 
dependence on the system. Once the hyperparameters are 
chosen, the impulse response of the system is computed 
as the minimum mean-square Bayes estimate given the 
observed input/output data (see e.g. [Pillonetto and De 
Nicolao, 2010], [Ghen et ah, 2012a]). 

A key assumption in kernel-based methods is that the out¬ 
put data and the system admit a joint Gaussian descrip¬ 
tion. Such an assumption does not hold with quantized 
data and we need to think of a different approach. In this 
paper we propose a solution based on Markov Ghain Monte 
Garlo (MCMG) techniques [Gilks et ah, 1996]. To this end, 
we define a target probability density; the estimate of the 
system can be obtained by drawing samples from it. Such 
a probability density is function of the following random 
variables: 1) the (unavailable) non-quantized output of 
the linear system, 2) the scaling hyperparameter of the 



kernel, 3) the unknown measurement noise variance, 4) 
the impulse response of the system. The main contri¬ 
bution of this paper is to show how to design a Gibbs 
sampler [Gilks et ah, 1996] by exploiting the knowledge 
of the conditional densities of the target distribution. The 
main advantage of the Gibbs sampler is that it does not 
require any rejection criterion of the generated samples 
and quickly converges to the target distribution. Note that 
MGMG-based approaches have recently gained popularity 
in system identification [Ninness and Henriksen, 2010], 
[Lindsten et ah, 2012], [Bottegal et ah, 2014]. 

The paper is organized as follows. In the next section, 
we introduce the problem of the identification of dynamic 
systems from quantized data. In Section 3, we give a 
Bayesian description of the variables entering the system. 
In Section 4, we describe the proposed method for iden¬ 
tification. Section 5 shows some simulations to assess the 
performances of the proposed method. Some conclusions 
end the paper. 

2. PROBLEM STATEMENT 

We consider the following linear time-invariant BIBO 
stable output error system 

zt = ig* u)t + vt , (I) 

where {gt}, t € T is the impulse response characterizing 
the unknown system, which is fed by the input {ut}, t G 7'. 
The set T corresponds to either IR+ or depending 
on whether the system is continuous-time or discrete¬ 
time. The output Zf is corrupted by the additive white 
Gaussian noise Vt, which has zero mean and unknown 
variance cr^, and measured at the time instants t G I. 
If the system is continuous-time, then X can represent any 
non-uniform sampling, whereas in the discrete-time case 
we shall consider X = TL^ (i-e., no downsampling). Eor 
ease of exposition, in this paper we shall derive our algo¬ 
rithm in the discrete-time case only; the extension to the 
continuous-time is quite straightforward (see e.g. [Wahba, 
1990], [Pillonetto and De Nicolao, 2010]). Actually, the 



Fig. 1. Block scheme of the system identification scenario. 

output Zt is not directly measurable, and only a quantized 
version is available, namely 

Vt = Q[zt], (2) 

where Q is a known map (our quantizer) of the type 

Q[x]=pk if X G (qk-i, qk] , (3) 

with pk G {pi, .■.,Pq} and qk G {go, ■ ■ ■, qq} being 
known (and typically go = ~oo and gg = oo). 

Remark 2.1. A particular and well-studied case is the 
binary quantizer, defined as 

, / -1 if a: < C 

2N = |i (4) 

It is well-known that a condition on the threshold to 
guarantee identifiability of the system is C 0. In fact, 
when (7 = 0, the system can be determined up to a scaling 
factor [Godoy et ah, 2011]. 


Without loss of generality, let us assume the system to be 
strictly causal, i.e. go = 0. We assume that N input-output 
data samples gi, ..., uq, ..., uat-i are collected dur¬ 
ing an experiment. We formulate our system identification 
problem as the problem of estimating the impulse response 
g for n time instants, namely obtain {gt\^^i. Recall that, 
if n is sufficiently large, these samples can be used to 
approximate the dynamics of the systems with arbitrary 
accuracy [Ljung and Wahlberg, 1992]. Introducing the 
vector notation 
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the input-output relation for the available samples can be 
written 


z = Ug + v 

yt = Q[zt] ,t = l,...,N 

so that our estimation problem can be cast in, say, a “linear 
regression plus quantization” form. 


3. BAYESIAN MODELS FOR THE QUANTITIES OF 
INTEREST 


3.1 Establishing a prior for the system 

In this paper we cast the system identification problem 
into a Bayesian framework. Our starting point is the 
setting of a proper prior on g. Following a Gaussian 
regression approach [Rasmussen and Williams, 2006], we 
model g as a zero-mean Gaussian random vector, i.e. we 
assume the following probability density function for g: 

/3) ~ A/'(0, AAT/s), (5) 

where Kp is a covariance matrix whose structure depends 
on the value of the shaping hyperparameter f and A > 
0 is the scaling hyperparameter. In this context, Kp is 
usually called a kernel and determines the properties of 
the realizations of g. In this paper, we choose Kg from the 
family of stable .spline kernels [Pillonetto and De Nicolao, 
2010], [Pillonetto et ah, 2011]. Such kernels are specifically 
designed for system identification purposes and give clear 
advantages compared to other standard kernels [Bottegal 
and Pillonetto, 2013], [Pillonetto and De Nicolao, 2010] 
(like the quadratic kernel or the Laplacian kernel, see 
[Schdlkopf and Smola, 2001]). In this paper we make use of 
the first-order stable spline kernel (or TC kernel in [Ghen 
et al., 2012a]). It is defined by 

{Kg},,, := , 0 < /3 < 1 , ( 6 ) 

The above kernel is parameterized by fi, which regulates 
the decaying velocity of the generated impulse responses. 

3.2 Bayesian description of the non-quantized output 

Since we have assumed Gaussian distribution of the noise 
V, the joint distribution of the vectors z and g, given values 





of A, P and the noise variance cr^, is jointly Gaussian, 
namely 
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A, /?, cr' 




JZ ^zg 
gz A-ff^ 


where 


= XUKfiU'^ + cr^/jv 


( 7 ) 


( 8 ) 


and Y^zg = Ylgz = It follows also that the posterior 

distribution of g, given the knowledge of z, is Gaussian, 
namely 

p{g\z, X, P, =M{Cz, P) , (9) 

where 


P = 


U^U 


(Ai^/3) 


-1 


C = P 




( 10 ) 


In [Pillonetto and De Nicolao, 2010], an impulse response 
estimator is derived starting from (9). In fact, the min¬ 
imum mean-squared error (MMSE) estimate of g is (see 
e.g. [Anderson and Moore, 1979]) 

g = E[g\z,X, P,a‘^] = Cz. (11) 

Such an estimator depends on the kernel hyperparameters 
and the noise variance. A common strategy to choose 
the kernel hyperparameters is to maximize the marginal 
likelihood of 2 :, that is 

(A, P) = argmaxlogp( 2 ;lA, P) 

A, /5 

= argminlogdet E 2 -I-. (12) 

An estimate < 7 ^ of can be computed by means of the 
following steps: 

(1) Gompute the least-squares estimate of o, i.e. 

hs = {U^ur^u^z, 
in order to obtain an estimate of 5 ; 

(2) Gompute the empirical estimate of 


.2 _ jz-UgLs) jz-UgLs) 

^ N-n 


(13) 


(14) 


think of alternative ways to estimate the values of the 
hyperparameters. 

Let us denote by Ga(a, b) the Gamma distribution with 
parameters a and b. The following result, drawn from 
[Magni et ah, 1998], shows the marginal distribution of 
the inverse of the hyperparameter A given g and p. 
Lemma 3.1. The posterior probability distribution of A“^ 
given g and P is 

p(A~^|g, P) ~ Ga ^ ^ (18) 

Remark 3.2. To obtain the result of the above lemma, 
we have implicitly set an improper prior on A with non¬ 
negative support, i.e., p(A) = A“^x+(A), where X+(A) is 
the indicator function with support M“'". 

A similar argument holds for the noise variance cr^. Re¬ 
calling that V is Gaussian with variance equal to a^I and 
readapting Lemma 3.1 to this case, it turns out that 

p{a-'^\v) ~ Ga , (19) 

where also here we have assumed the improper prior 

p(cr2) = 

In some situations, e.g. when the quantizer has mild effects 
on the measured signal (e.g., when the resolution of the 
quantizer is high), a sufficiently reliable estimate of can 
be obtained by using (14), with 2 ; replaced by y. However, 
note that in general is not a consistent estimate of 

We conclude this section by recalling that, unfortunately, 
establishing a Bayesian model for P is still an unsolved 
problem. In this paper, we shall consider such an hyperpa¬ 
rameter as deterministic. A method for its choice will be 
discussed in the next section. 

4. PROPOSED SYSTEM IDENTIFIGATION 
METHOD 


Glearly, the system identihcation method described above 
is not applicable in our problem, since z is not available. 
However, we can draw some information on such a vector 
from the quantized output y. First, note that from (7) it 
follows that 

p{z\g, cr^) =Af{Ug, a'^l) . (15) 

Note also that, once g is given, (15) is independent of A 
and p. Let Ut denote the t-th row of U. Then, for each 
entry Zt it holds that (see e.g. [Bae and Maffick, 2004]) 

p{zt\g, cr^, yt =Pk) =Xfql:_^iUtg, cr^), (16) 

where cr^) denotes a Gaussian distribution trun¬ 

cated below a and above b, whose original mean and 
variance are p and cr^ respectively. Note that, for t 7 ^ j 
p{zt, z^g, cr^, y) =p{zt\g, y)p{zj\g, cr^, y), (17) 

due to the assumption on whiteness of noise. 

3.3 Bayesian deseription of hyperparameters and noise 
variance 

As mentioned in the previous subsection, knowing the 
values of hyperparameters is of paramount importance in 
kernel-based methods. The marginal likelihood maximiza¬ 
tion approach ( 12 ) is not applicable here, so we have to 


In this section we show how to exploit the Bayesian models 
introduced in the previous section to derive our system 
identification method. Assume for the moment that P is 
known and let us drop it from the notation below. The 
impulse response estimate g can be obtained by computing 
the following integral 

9 = J 9 P{z, X, a'^, g\y) dz dXda'^ dg , ( 20 ) 

where 

p{z, A, ( 7 ^ g\y) (21) 

denotes the joint distribution of the random quantities 
described in the previous section, given the quantized out¬ 
put y. The above integral can be computed using Markov 
Ghain Monte Carlo (MCMC) methods [Gilks et ah, 1996], 
by drawing a large number of samples from the distri¬ 
bution p{z, A, ( 7 ^, g\y) (usually called target distribution) 
and then averaging over g. In general, drawing samples 
from a distribution is a hard problem, if its probability 
density function does not admit a closed-form expression. 
However, if all the conditional probability densities of such 
a distribution are available in closed-form, the problem 
of sampling from the target distribution can be solved 
efficiently by resorting on a special case of the Metropolis- 
Hastings method, namely the Gibbs sampler (see e.g. 



[Gilks et al., 1996]). The basic idea is that each conditional 
random variable is the state of a Markov chain; then, by 
drawing samples from each conditional probability density 
iteratively, we converge to the stationary state of this 
Markov chain and generate samples of the target distri¬ 
bution. Here, the conditionals of (21) are as follows. 

(1) p{z\\, a^, g, y). As discussed in Section 3.2, once g is 
given, this conditional density does not depend on A. 
Moreover, due to the assumptions on noise, it density 
factorizes as follows 

N 

Y[p{zt\<T^, g,yt), (22) 

t=i 

where each of the factors is a truncated Gaussian 
according to (16). 

(2) p{X~^\z, (J^, g, y). Once g is given, A becomes inde¬ 
pendent of all the other variables, i.e. such condi¬ 
tional density becomes p{X~^\g), and corresponds to 
(18), namely a Gamma distribution with parameters 



(3) p{a'^\z, A, g, y). Once g and z are given, one can 

compute V = z — Ug. Recalling (19), it follows that 
this conditional density can be written as p(a‘^\z, g) 
and is distributed as a Gamma random variable with 
parameters (^, y 

(4) p{g\z, A, (T^, y). Given z, information carried by y 
becomes redundant and can be discarded, so that 
this conditional probability density corresponds to 
p{g\z, A, (J^). Its closed-form expression is given by 
(9), namely a Gaussian distribution with mean Cz 
and covariance matrix P (see (10)). 

Given the above conditional densities, we are in position 
to illustrate the proposed identification algorithm. 

Algorithm; Bayesian system identification with quan¬ 
tized output measurements 

Input: {ytlili, 

Output: 

(1) Initialization: Gompute initial values (t^’° and set 

P 

(2) For I = 1 to M: 

(a) Draw the sample zl, from p{zt\g''~^, j/t), 

(b) Draw the sample A* from p(A ^) 

(c) Draw the sample cr^’* from p{a‘^\g'^~^, z*) 

(d) Draw the sample g* from p{g\X, z*, ct^’®) 

(3) Gompute g = EZmo 9" 

In the above algorithm, the parameters M and Mq are 
introduced. M the number of samples to be generated; 
clearly, large values of M should guarantee more accurate 
estimates of g. Mq is the number of initial samples drawn 
from the conditional of g to be discarded and is also 
known as burn-in [Meyn and Tweedie, 2009]. In fact, the 
conditionals from which those samples are drawn are to 
be considered as non-stationary, since the Gibbs sampler 
takes a certain number of iterations to get close to a 
stationary distribution. 

Setting initial values of the Gibbs sampler The initial es¬ 
timate g^ can be computed using the kernel-based method 


introduced in [Pillonetto and De Nicolao, 2010] and briefly 
revisited in Section 3.2. Replacing z with y in (II), one 
can obtain a (very) rough estimate of (/, which can serve 
as initial condition for the Gibbs sampler. 

Similarly, the initial value a'^’^ can be computed from (14), 
again by replacing z with y. 

4.1 Estimation of the hyperparameter /? 

It remains to set a scheme for estimating /3. In [Chen 
et ah, 2012b], an exact marginal likelihood maximiza¬ 
tion approach is proposed, but it is shown that such an 
approach needs a solution of a complicated integral. In 
this paper, we adopy a simple (and approximate) way to 
estimate /3. It consists in maximizing the cost function 
of (12), where, instead of using the non-available data 
z, we plug the quantized output y. Clearly, one should 
not expect to get very good results in general, especially 
when the difference between z and y is high (e.g. when the 
quantizer is binary). However, numerical experiments have 
shown that the accuracy on the estimation of /3 with this 
strategy is satisfactory enough in order to obtain a good 
performance of the algorithm. 

4-2 Comparison with [Chen et al, 2012b] 

Although the methods proposed here and the method 
proposed in [Chen et ah, 2012b] exploit the same Bayesian 
modeling of the unknown system, the techniques used to 
carry out the estimate of g are substantially different. 
In [Chen et ah, 2012b], the impulse response is obtained 
as the maximum a posteriori (MAP) estimate given the 
quantized output data. Here instead, g is computed by 
means of (20), that is a minimum mean square error Bayes 
estimator. 

5. NUMERICAL EXPERIMENTS 

We test the proposed algorithm by means of 2 Monte 
Carlo experiments of 100 runs each. For each Monte Carlo 
run, we generate a linear system by picking 10 pairs of 
complex conjugate zeros with magnitude randomly chosen 
in [0, 0.95] and random phase. Similarly, we pick 10 pairs of 
complex conjugate poles with magnitude randomly chosen 
in [0, 0.93] and random phase. The goal is to estimate 
n = 50 samples of the impulse response from N input- 
output data. The inputs are realizations of white noise 
with unit variance. We compare the following estimators. 

• B-Q-GS: This is the method described in this paper, 
namely a Bayesian system identification method for 
Quantized output data that uses the Gibbs Sampler. 
The parameter M, denoting the number of samples 
generated by the sampler, is set to 3 • 10^. The first 
Mq = 10^ samples are discarded. The validity of 
the choice of M and Mq is checked by assessing 
that quantiles 0.25, 0.5, 0.75 are estimated with good 
precision [Raftery and Lewis, 1996]. 

• SS-ML: This is the nonparametric kernel-based method 
proposed in [Pillonetto and De Nicolao, 2010] and 
revisited in [Chen et ah, 2012a], which is not designed 
for handling quantized data. This method requires the 
estimation of the same parameters as our proposed 



method. The kernel adopted for identification is (6). 
Its hyperparameters are estimated using (12), while 
(T^ is estimated through (14) (in both cases replacing 
z with y). 

• LS: This is the least-squares estimator, where the 
data employed to estimate g are the quantized output 
measurements y. Note that, in principle, here the 
parameter n should be estimated from data using 
complexity criteria such as AIC or BIC [Ljung, 1999]. 
Here, for simplicity, we fix it to 50. 

• SS-ML-NQ: Same as SS-ML. However, here we make 
use of the non-quantized vector z. Hence, this esti¬ 
mator exploits information which is not available in 
practice in this problem. 

• LS-NQ: Least-squares estimator having access to the 
vector z. The parameter n is fixed as for the LS 
estimator. 


The performances of the estimators are evaluated by 
means of the fitting score, computed as 

FIT, = 1 - , (23) 

hi-gth 

where gi is the impulse response generated at the i-th run, 
gi its mean and gi the estimate computed by the tested 
methods. 


5.1 Binary quantizer 


The first experiment is on the following binary quantizer 

1 if a: > 1 
— 1 if a: < 1 


Qb[x] ■■= 


For each Monte Carlo run, the noise variance is such that 
= 10, i.e. the ratio between the variance of the 
noiseless (non-quantized) output and the noise is equal 
to 10. We generate N = 500 data samples. Figure 2 


5.2 Ceil-type quantizer 

In the second experiment we test the performance of our 
method on systems followed by a ceil-type quantizer, which 
is defined as 

Qc[x\ := [a;] . 

Again, for each Monte Carlo run, the noise variance is such 
that generate N = 200 data samples. 

As shown in Figure 4, in this case, if one compares the 
oracle-type methods (i.e. SS-ML-Or. and LS-Or.) with the 
same methods using quantized data (SS-ML and LS), the 
loss of accuracy is relatively low. This because this type of 
quantizer has a mild effect on the measurements. It can be 
seen, however, that the proposed method is able to give a 
fit that is comparable to the standard kernel-based method 
that uses non-quantized data (SS-ML-Or). Moreover, it 
outperforms the least-squares estimator equipped with the 
knowledge of non-quantized data. 
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Fig. 4. Box plots of the fitting scores for the ceil-type 
quantizer experiment. 
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6 . CONCLUSIONS 

In this paper, we have introduced a novel method for sys¬ 
tem identification when the output is subject to quantiza¬ 
tion. We have proposed a MCMC scheme that exploits the 
Bayesian description of the unknown system. In particular, 
we have shown how to design an integration scheme based 
on the Gibbs sampler by exploiting the knowledge of the 
conditional probability density functions of the variables 
entering the system. We have highlighted, through some 
numerical experiments, the advantages of employing our 
method when quantizers affect the accuracy of measure¬ 
ments. 


Fig. 2. Box plots of the fitting scores for the binary 
quantizer experiment. 

shows the results of the Monte Carlo runs. The advantage 
of using the proposed identification technique, compared 
to methods which do not account for the quantizer, is 
evident. Despite the large loss of information caused by 
the quantizer, the proposed method gives a fit which is 
quite comparable to the oracle methods. Figure 3 reports 
one of the generated scenarios. It can be seen that there 
is a substantial difference between y and z. Nonetheless, 
the accuracy of the estimation of the impulse response is 
acceptable. 


Important questions such as consistency of the method 
and robust selection of the kernel hyperparameter fi are 
currently under study. 
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